Overview

Dataset statistics

Number of variables: 20
Number of observations: 777715
Missing cells: 240048
Missing cells (%): 1.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 140.7 MiB
Average record size in memory: 189.7 B
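
The derived figures in this block can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB means 1024 × 1024 bytes. Both are assumptions about how the report computes these values, not something the report states.

```python
# Raw figures copied from the "Dataset statistics" block.
n_variables = 20
n_observations = 777_715
missing_cells = 240_048
total_size_mib = 140.7

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures reported above (1.5% and 189.7 B).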

Variable types

Numeric: 7
Categorical: 11
Boolean: 2

Alerts

FLAG_MOBIL has constant value "1" (Constant)
CNT_CHILDREN is highly overall correlated with CNT_FAM_MEMBERS (High correlation)
DAYS_EMPLOYED is highly overall correlated with NAME_INCOME_TYPE and 1 other field (High correlation)
CNT_FAM_MEMBERS is highly overall correlated with CNT_CHILDREN (High correlation)
CODE_GENDER is highly overall correlated with FLAG_OWN_CAR and 1 other field (High correlation)
NAME_INCOME_TYPE is highly overall correlated with DAYS_BIRTH and 1 other field (High correlation)
OCCUPATION_TYPE is highly overall correlated with CODE_GENDER (High correlation)
DAYS_BIRTH is highly overall correlated with NAME_INCOME_TYPE and 1 other field (High correlation)
FLAG_OWN_CAR is highly overall correlated with CODE_GENDER (High correlation)
OCCUPATION_TYPE has 240048 (30.9%) missing values (Missing)
CNT_CHILDREN has 540639 (69.5%) zeros (Zeros)
MONTHS_BALANCE has 24672 (3.2%) zeros (Zeros)
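
The percentages in the Missing and Zeros alerts follow directly from the row count. As a minimal sketch, assuming each percentage is simply the count divided by the number of observations (the helper name `share` is hypothetical):

```python
n_observations = 777_715

# Counts copied from the alerts above.
alert_counts = {
    "OCCUPATION_TYPE missing": 240_048,
    "CNT_CHILDREN zeros": 540_639,
    "MONTHS_BALANCE zeros": 24_672,
}

def share(count: int, total: int = n_observations) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)

alert_pct = {name: share(count) for name, count in alert_counts.items()}
for name, pct in alert_pct.items():
    print(f"{name}: {pct}%")
```

The OCCUPATION_TYPE count (240048) equals the total number of missing cells reported in the dataset statistics, so this one column accounts for every missing value in the dataset.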

Reproduction

Analysis started: 2023-03-05 22:51:23.499218
Analysis finished: 2023-03-05 22:53:23.005522
Duration: 1 minute and 59.51 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
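
The reported duration can be recomputed from the two timestamps. This is a small sketch using only the standard library; it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-03-05 22:51:23.499218")
finished = datetime.fromisoformat("2023-03-05 22:53:23.005522")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```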

Variables

ID
Real number (ℝ)

Distinct: 36457
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5078742.9
Minimum: 5008804
Maximum: 5150487
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.9 MiB

Quantile statistics

Minimum: 5008804
5th percentile: 5018481
Q1: 5044568.5
Median: 5069530
Q3: 5115551
95th percentile: 5146052
Maximum: 5150487
Range: 141683
Interquartile range (IQR): 70982.5

Descriptive statistics

Standard deviation: 41804.425
Coefficient of variation (CV): 0.0082312543
Kurtosis: -1.2068975
Mean: 5078742.9
Median Absolute Deviation (MAD): 35960
Skewness: 0.073626481
Sum: 3.9498146 × 10^12
Variance: 1.7476099 × 10^9
Monotonicity: Not monotonic
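
Several of the ID statistics are derived from others. The sketch below recomputes them using the textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared). That the report uses exactly these definitions is an assumption.

```python
# Figures copied from the ID variable summary.
minimum, maximum = 5_008_804, 5_150_487
q1, q3 = 5_044_568.5, 5_115_551
mean, std = 5_078_742.9, 41_804.425
n_rows, n_distinct = 777_715, 36_457

value_range = maximum - minimum    # Range
iqr = q3 - q1                      # Interquartile range
cv = std / mean                    # Coefficient of variation
variance = std ** 2                # Variance
rows_per_id = n_rows / n_distinct  # Average number of rows sharing one ID

print(value_range, iqr)
print(f"CV = {cv:.7f}, variance = {variance:.4e}")
print(f"Rows per distinct ID: {rows_per_id:.1f}")
```

Each ID appears about 21 times on average, which suggests that the dataset holds repeated records per ID rather than one row per ID.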
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
5090630 61
 
< 0.1%
5148524 61
 
< 0.1%
5066707 61
 
< 0.1%
5061848 61
 
< 0.1%
5118380 61
 
< 0.1%
5112636 61
 
< 0.1%
5009106 61
 
< 0.1%
5099880 61
 
< 0.1%
5085886 61
 
< 0.1%
5045838 61
 
< 0.1%
Other values (36447) 777105
99.9%
ValueCountFrequency (%)
5008804 16
< 0.1%
5008805 15
 
< 0.1%
5008806 30
< 0.1%
5008808 5
 
< 0.1%
5008809 5
 
< 0.1%
5008810 27
< 0.1%
5008811 39
< 0.1%
5008812 17
< 0.1%
5008813 17
< 0.1%
5008814 17
< 0.1%
ValueCountFrequency (%)
5150487 30
< 0.1%
5150485 2
 
< 0.1%
5150484 13
 
< 0.1%
5150483 18
< 0.1%
5150482 18
< 0.1%
5150481 43
< 0.1%
5150480 26
< 0.1%
5150479 9
 
< 0.1%
5150478 14
 
< 0.1%
5150477 21
< 0.1%

CODE_GENDER
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
F: 518851
M: 258864

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 777715
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: M
5th row: M

Common Values

Value  Count   Frequency (%)
F      518851  66.7%
M      258864  33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 518851
66.7%
m 258864
33.3%

Most occurring characters

ValueCountFrequency (%)
F 518851
66.7%
M 258864
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 777715
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 518851
66.7%
M 258864
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 777715
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 518851
66.7%
M 258864
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 777715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 518851
66.7%
M 258864
33.3%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.7 MiB
False: 473355
True: 304360

Value  Count   Frequency (%)
False  473355  60.9%
True   304360  39.1%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.7 MiB
True: 512948
False: 264767

Value  Count   Frequency (%)
True   512948  66.0%
False  264767  34.0%

CNT_CHILDREN
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.42808227
Minimum: 0
Maximum: 19
Zeros: 540639
Zeros (%): 69.5%
Negative: 0
Negative (%): 0.0%
Memory size: 11.9 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 2
Maximum: 19
Range: 19
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7457552
Coefficient of variation (CV): 1.7420839
Kurtosis: 21.025355
Mean: 0.42808227
Median Absolute Deviation (MAD): 0
Skewness: 2.5873102
Sum: 332926
Variance: 0.55615083
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value  Count   Frequency (%)
0      540639  69.5%
1      155638  20.0%
2      70399   9.1%
3      9328    1.2%
4      1224    0.2%
5      324     < 0.1%
14     111     < 0.1%
7      46      < 0.1%
19     6       < 0.1%
ValueCountFrequency (%)
0 540639
69.5%
1 155638
 
20.0%
2 70399
 
9.1%
3 9328
 
1.2%
4 1224
 
0.2%
5 324
 
< 0.1%
7 46
 
< 0.1%
14 111
 
< 0.1%
19 6
 
< 0.1%
ValueCountFrequency (%)
19 6
 
< 0.1%
14 111
 
< 0.1%
7 46
 
< 0.1%
5 324
 
< 0.1%
4 1224
 
0.2%
3 9328
 
1.2%
2 70399
 
9.1%
1 155638
 
20.0%
0 540639
69.5%

AMT_INCOME_TOTAL
Real number (ℝ)

Distinct: 265
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 188534.8
Minimum: 27000
Maximum: 1575000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.9 MiB

Quantile statistics

Minimum: 27000
5th percentile: 76500
Q1: 121500
Median: 162000
Q3: 225000
95th percentile: 360000
Maximum: 1575000
Range: 1548000
Interquartile range (IQR): 103500

Descriptive statistics

Standard deviation: 101622.45
Coefficient of variation (CV): 0.53901163
Kurtosis: 15.804592
Mean: 188534.8
Median Absolute Deviation (MAD): 49500
Skewness: 2.5776417
Sum: 1.4662634 × 10^11
Variance: 1.0327122 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
135000              90217   11.6%
180000              68579   8.8%
157500              62686   8.1%
112500              61622   7.9%
225000              61399   7.9%
202500              47707   6.1%
270000              37222   4.8%
90000               36337   4.7%
315000              23136   3.0%
67500               18822   2.4%
Other values (255)  269988  34.7%
ValueCountFrequency (%)
27000 78
 
< 0.1%
29250 44
 
< 0.1%
30150 79
 
< 0.1%
31500 240
< 0.1%
31531.5 65
 
< 0.1%
31950 7
 
< 0.1%
32400 32
 
< 0.1%
33300 323
< 0.1%
33750 8
 
< 0.1%
36000 144
< 0.1%
ValueCountFrequency (%)
1575000 150
 
< 0.1%
1350000 102
 
< 0.1%
1125000 83
 
< 0.1%
990000 26
 
< 0.1%
945000 48
 
< 0.1%
900000 844
0.1%
810000 317
 
< 0.1%
787500 42
 
< 0.1%
765000 122
 
< 0.1%
742500 100
 
< 0.1%

NAME_INCOME_TYPE
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
Working: 400164
Commercial associate: 183385
Pensioner: 128392
State servant: 65437
Student: 337

Length

Max length20
Median length7
Mean length10.900415
Min length7

Characters and Unicode

Total characters8477416
Distinct characters21
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowWorking
2nd rowWorking
3rd rowWorking
4th rowWorking
5th rowWorking

Common Values

ValueCountFrequency (%)
Working 400164
51.5%
Commercial associate 183385
23.6%
Pensioner 128392
 
16.5%
State servant 65437
 
8.4%
Student 337
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
working 400164
39.0%
commercial 183385
17.9%
associate 183385
17.9%
pensioner 128392
 
12.5%
state 65437
 
6.4%
servant 65437
 
6.4%
student 337
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
i 895326
10.6%
o 895326
10.6%
r 777378
 
9.2%
e 754765
 
8.9%
n 722722
 
8.5%
a 681029
 
8.0%
s 560599
 
6.6%
W 400164
 
4.7%
k 400164
 
4.7%
g 400164
 
4.7%
Other values (11) 1989779
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7450879
87.9%
Uppercase Letter 777715
 
9.2%
Space Separator 248822
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 895326
12.0%
o 895326
12.0%
r 777378
10.4%
e 754765
10.1%
n 722722
9.7%
a 681029
9.1%
s 560599
7.5%
k 400164
5.4%
g 400164
5.4%
t 380370
 
5.1%
Other values (6) 983036
13.2%
Uppercase Letter
ValueCountFrequency (%)
W 400164
51.5%
C 183385
23.6%
P 128392
 
16.5%
S 65774
 
8.5%
Space Separator
ValueCountFrequency (%)
248822
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8228594
97.1%
Common 248822
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 895326
10.9%
o 895326
10.9%
r 777378
9.4%
e 754765
9.2%
n 722722
8.8%
a 681029
 
8.3%
s 560599
 
6.8%
W 400164
 
4.9%
k 400164
 
4.9%
g 400164
 
4.9%
Other values (10) 1740957
21.2%
Common
ValueCountFrequency (%)
248822
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8477416
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 895326
10.6%
o 895326
10.6%
r 777378
 
9.2%
e 754765
 
8.9%
n 722722
 
8.5%
a 681029
 
8.0%
s 560599
 
6.6%
W 400164
 
4.7%
k 400164
 
4.7%
g 400164
 
4.7%
Other values (11) 1989779
23.5%

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
Secondary / secondary special: 524261
Higher education: 213633
Incomplete higher: 30329
Lower secondary: 8655
Academic degree: 837

Length

Max length29
Median length29
Mean length24.790148
Min length15

Characters and Unicode

Total characters19279670
Distinct characters25
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHigher education
2nd rowHigher education
3rd rowHigher education
4th rowHigher education
5th rowHigher education

Common Values

ValueCountFrequency (%)
Secondary / secondary special 524261
67.4%
Higher education 213633
27.5%
Incomplete higher 30329
 
3.9%
Lower secondary 8655
 
1.1%
Academic degree 837
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
secondary 1057177
40.6%
524261
20.1%
special 524261
20.1%
higher 243962
 
9.4%
education 213633
 
8.2%
incomplete 30329
 
1.2%
lower 8655
 
0.3%
academic 837
 
< 0.1%
degree 837
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 2111694
11.0%
c 1827074
9.5%
1826237
9.5%
a 1795908
9.3%
r 1310631
 
6.8%
o 1309794
 
6.8%
n 1301139
 
6.7%
d 1272484
 
6.6%
y 1057177
 
5.5%
s 1057177
 
5.5%
Other values (15) 4410355
22.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16151457
83.8%
Space Separator 1826237
 
9.5%
Uppercase Letter 777715
 
4.0%
Other Punctuation 524261
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2111694
13.1%
c 1827074
11.3%
a 1795908
11.1%
r 1310631
8.1%
o 1309794
8.1%
n 1301139
8.1%
d 1272484
7.9%
y 1057177
6.5%
s 1057177
6.5%
i 982693
6.1%
Other values (8) 2125686
13.2%
Uppercase Letter
ValueCountFrequency (%)
S 524261
67.4%
H 213633
27.5%
I 30329
 
3.9%
L 8655
 
1.1%
A 837
 
0.1%
Space Separator
ValueCountFrequency (%)
1826237
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 524261
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 16929172
87.8%
Common 2350498
 
12.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2111694
12.5%
c 1827074
10.8%
a 1795908
10.6%
r 1310631
7.7%
o 1309794
7.7%
n 1301139
7.7%
d 1272484
7.5%
y 1057177
 
6.2%
s 1057177
 
6.2%
i 982693
 
5.8%
Other values (13) 2903401
17.2%
Common
ValueCountFrequency (%)
1826237
77.7%
/ 524261
 
22.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19279670
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 2111694
11.0%
c 1827074
9.5%
1826237
9.5%
a 1795908
9.3%
r 1310631
 
6.8%
o 1309794
 
6.8%
n 1301139
 
6.7%
d 1272484
 
6.6%
y 1057177
 
5.5%
s 1057177
 
5.5%
Other values (15) 4410355
22.9%

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
Married: 546619
Single / not married: 94335
Civil marriage: 60342
Separated: 45255
Widow: 31164

Length

Max length20
Median length7
Mean length9.1562282
Min length5

Characters and Unicode

Total characters7120936
Distinct characters20
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCivil marriage
2nd rowCivil marriage
3rd rowCivil marriage
4th rowCivil marriage
5th rowCivil marriage

Common Values

ValueCountFrequency (%)
Married 546619
70.3%
Single / not married 94335
 
12.1%
Civil marriage 60342
 
7.8%
Separated 45255
 
5.8%
Widow 31164
 
4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
married 640954
57.2%
single 94335
 
8.4%
94335
 
8.4%
not 94335
 
8.4%
civil 60342
 
5.4%
marriage 60342
 
5.4%
separated 45255
 
4.0%
widow 31164
 
2.8%

Most occurring characters

ValueCountFrequency (%)
r 1447847
20.3%
i 947479
13.3%
e 886141
12.4%
a 852148
12.0%
d 717373
10.1%
M 546619
 
7.7%
343347
 
4.8%
n 188670
 
2.6%
g 154677
 
2.2%
l 154677
 
2.2%
Other values (10) 881958
12.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5905539
82.9%
Uppercase Letter 777715
 
10.9%
Space Separator 343347
 
4.8%
Other Punctuation 94335
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1447847
24.5%
i 947479
16.0%
e 886141
15.0%
a 852148
14.4%
d 717373
12.1%
n 188670
 
3.2%
g 154677
 
2.6%
l 154677
 
2.6%
m 154677
 
2.6%
t 139590
 
2.4%
Other values (4) 262260
 
4.4%
Uppercase Letter
ValueCountFrequency (%)
M 546619
70.3%
S 139590
 
17.9%
C 60342
 
7.8%
W 31164
 
4.0%
Space Separator
ValueCountFrequency (%)
343347
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 94335
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6683254
93.9%
Common 437682
 
6.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1447847
21.7%
i 947479
14.2%
e 886141
13.3%
a 852148
12.8%
d 717373
10.7%
M 546619
 
8.2%
n 188670
 
2.8%
g 154677
 
2.3%
l 154677
 
2.3%
m 154677
 
2.3%
Other values (8) 632946
9.5%
Common
ValueCountFrequency (%)
343347
78.4%
/ 94335
 
21.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7120936
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 1447847
20.3%
i 947479
13.3%
e 886141
12.4%
a 852148
12.0%
d 717373
10.1%
M 546619
 
7.7%
343347
 
4.8%
n 188670
 
2.6%
g 154677
 
2.2%
l 154677
 
2.2%
Other values (10) 881958
12.4%

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
House / apartment: 697151
With parents: 35735
Municipal apartment: 24640
Rented apartment: 10898
Office apartment: 5636

Length

Max length19
Median length17
Mean length16.802963
Min length12

Characters and Unicode

Total characters13067916
Distinct characters25
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowRented apartment
2nd rowRented apartment
3rd rowRented apartment
4th rowRented apartment
5th rowRented apartment

Common Values

ValueCountFrequency (%)
House / apartment 697151
89.6%
With parents 35735
 
4.6%
Municipal apartment 24640
 
3.2%
Rented apartment 10898
 
1.4%
Office apartment 5636
 
0.7%
Co-op apartment 3655
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
apartment 741980
32.9%
house 697151
30.9%
697151
30.9%
with 35735
 
1.6%
parents 35735
 
1.6%
municipal 24640
 
1.1%
rented 10898
 
0.5%
office 5636
 
0.3%
co-op 3655
 
0.2%

Most occurring characters

ValueCountFrequency (%)
t 1566328
12.0%
a 1544335
11.8%
e 1502298
11.5%
1474866
11.3%
n 813253
 
6.2%
p 806010
 
6.2%
r 777715
 
6.0%
m 741980
 
5.7%
s 732886
 
5.6%
u 721791
 
5.5%
Other values (15) 2386454
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10114529
77.4%
Space Separator 1474866
 
11.3%
Uppercase Letter 777715
 
6.0%
Other Punctuation 697151
 
5.3%
Dash Punctuation 3655
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 1566328
15.5%
a 1544335
15.3%
e 1502298
14.9%
n 813253
8.0%
p 806010
8.0%
r 777715
7.7%
m 741980
7.3%
s 732886
7.2%
u 721791
7.1%
o 704461
7.0%
Other values (6) 203472
 
2.0%
Uppercase Letter
ValueCountFrequency (%)
H 697151
89.6%
W 35735
 
4.6%
M 24640
 
3.2%
R 10898
 
1.4%
O 5636
 
0.7%
C 3655
 
0.5%
Space Separator
ValueCountFrequency (%)
1474866
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 697151
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3655
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10892244
83.4%
Common 2175672
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 1566328
14.4%
a 1544335
14.2%
e 1502298
13.8%
n 813253
7.5%
p 806010
7.4%
r 777715
7.1%
m 741980
6.8%
s 732886
6.7%
u 721791
6.6%
o 704461
6.5%
Other values (12) 981187
9.0%
Common
ValueCountFrequency (%)
1474866
67.8%
/ 697151
32.0%
- 3655
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13067916
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 1566328
12.0%
a 1544335
11.8%
e 1502298
11.5%
1474866
11.3%
n 813253
 
6.2%
p 806010
 
6.2%
r 777715
 
6.0%
m 741980
 
5.7%
s 732886
 
5.6%
u 721791
 
5.5%
Other values (15) 2386454
18.3%

DAYS_BIRTH
Real number (ℝ)

Distinct: 7183
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -16124.937
Minimum: -25152
Maximum: -7489
Zeros: 0
Zeros (%): 0.0%
Negative: 777715
Negative (%): 100.0%
Memory size: 11.9 MiB

Quantile statistics

Minimum: -25152
5th percentile: -23015
Q1: -19453
Median: -15760
Q3: -12716
95th percentile: -10048
Maximum: -7489
Range: 17663
Interquartile range (IQR): 6737

Descriptive statistics

Standard deviation: 4104.304
Coefficient of variation (CV): -0.25453148
Kurtosis: -1.0227322
Mean: -16124.937
Median Absolute Deviation (MAD): 3327
Skewness: -0.17693133
Sum: -1.2540605 × 10^10
Variance: 16845311
Monotonicity: Not monotonic
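
Every DAYS_BIRTH value is negative, which would be consistent with the column counting days backwards from a reference date. Under that assumption (the report itself does not define the column), the quantiles can be read as ages in years. The 365.25-day year and the helper name `days_to_years` are also assumptions made for this sketch.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

def days_to_years(days_birth: int) -> float:
    """Convert a negative day offset into a positive age in years."""
    return -days_birth / DAYS_PER_YEAR

# Quantiles copied from the DAYS_BIRTH summary.
quantiles = {
    "Minimum": -25152,
    "Q1": -19453,
    "Median": -15760,
    "Q3": -12716,
    "Maximum": -7489,
}

ages = {name: round(days_to_years(days), 1) for name, days in quantiles.items()}
for name, age in ages.items():
    print(f"{name}: {age} years")
```

Because the sign is flipped, the minimum day offset corresponds to the oldest person and the maximum to the youngest.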
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-14667 1018
 
0.1%
-15140 928
 
0.1%
-15675 835
 
0.1%
-15519 799
 
0.1%
-16995 799
 
0.1%
-13788 796
 
0.1%
-12483 793
 
0.1%
-14636 787
 
0.1%
-13300 784
 
0.1%
-12569 720
 
0.1%
Other values (7173) 769456
98.9%
ValueCountFrequency (%)
-25152 84
< 0.1%
-25140 112
< 0.1%
-25099 32
 
< 0.1%
-25088 32
 
< 0.1%
-25010 21
 
< 0.1%
-24970 47
 
< 0.1%
-24963 17
 
< 0.1%
-24946 97
< 0.1%
-24932 146
< 0.1%
-24914 129
< 0.1%
ValueCountFrequency (%)
-7489 1
 
< 0.1%
-7705 5
 
< 0.1%
-7723 2
 
< 0.1%
-7757 52
< 0.1%
-7959 13
 
< 0.1%
-7980 17
 
< 0.1%
-8041 78
< 0.1%
-8054 4
 
< 0.1%
-8056 25
 
< 0.1%
-8067 13
 
< 0.1%

DAYS_EMPLOYED
Real number (ℝ)

Distinct: 3640
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57775.825
Minimum: -15713
Maximum: 365243
Zeros: 0
Zeros (%): 0.0%
Negative: 649743
Negative (%): 83.5%
Memory size: 11.9 MiB

Quantile statistics

Minimum: -15713
5th percentile: -7369
Q1: -3292
Median: -1682
Q3: -431
95th percentile: 365243
Maximum: 365243
Range: 380956
Interquartile range (IQR): 2861

Descriptive statistics

Standard deviation: 136471.74
Coefficient of variation (CV): 2.3620906
Kurtosis: 1.2722827
Mean: 57775.825
Median Absolute Deviation (MAD): 1379
Skewness: 1.808405
Sum: 4.4933126 × 10^10
Variance: 1.8624535 × 10^10
Monotonicity: Not monotonic
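
The value 365243 occurs 127972 times in this column and is the only non-negative value (649743 negative rows + 127972 = 777715). It appears to act as a placeholder rather than a real duration; that reading is an inference from the distribution, not something the report states. The sketch below estimates the mean of the remaining rows from the reported (rounded) sum.

```python
# Figures copied from the DAYS_EMPLOYED summary and frequency table.
n_rows = 777_715
n_negative = 649_743
sentinel, n_sentinel = 365_243, 127_972
reported_sum = 4.4933126e10  # rounded in the report

# Every row is either negative or equal to the suspected placeholder.
assert n_negative + n_sentinel == n_rows

# Remove the placeholder's contribution and average over the remaining rows.
sum_without_sentinel = reported_sum - sentinel * n_sentinel
mean_without_sentinel = sum_without_sentinel / n_negative

print(f"Mean without placeholder: {mean_without_sentinel:.0f} days")
print(f"Placeholder in years: {sentinel / 365.25:.1f}")
```

With the placeholder excluded, the mean moves from 57775.825 to roughly -2782 days, which shows how strongly this single value drives the mean, standard deviation and skewness reported above.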
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
365243               127972  16.5%
-1751                1601    0.2%
-1539                1545    0.2%
-401                 1498    0.2%
-2531                1319    0.2%
-108                 1319    0.2%
-200                 1221    0.2%
-1812                1219    0.2%
-1678                1179    0.2%
-2087                1176    0.2%
Other values (3630)  637666  82.0%
ValueCountFrequency (%)
-15713 17
 
< 0.1%
-15661 116
 
< 0.1%
-15227 18
 
< 0.1%
-15072 33
 
< 0.1%
-15038 544
0.1%
-14887 186
 
< 0.1%
-14810 296
< 0.1%
-14775 58
 
< 0.1%
-14536 160
 
< 0.1%
-14473 204
 
< 0.1%
ValueCountFrequency (%)
365243 127972
16.5%
-17 33
 
< 0.1%
-43 29
 
< 0.1%
-65 45
 
< 0.1%
-66 21
 
< 0.1%
-70 45
 
< 0.1%
-71 18
 
< 0.1%
-73 374
 
< 0.1%
-78 8
 
< 0.1%
-79 27
 
< 0.1%

FLAG_MOBIL
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
1: 777715

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters777715
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1 777715
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 777715
100.0%

Most occurring characters

ValueCountFrequency (%)
1 777715
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 777715
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 777715
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 777715
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 777715
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 777715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 777715
100.0%

FLAG_WORK_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
0: 597427
1: 180288

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters777715
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

Most occurring characters

ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 777715
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

Most occurring scripts

ValueCountFrequency (%)
Common 777715
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 777715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 597427
76.8%
1 180288
 
23.2%

FLAG_PHONE
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
0: 543650
1: 234065

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters777715
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

Most occurring characters

ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 777715
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

Most occurring scripts

ValueCountFrequency (%)
Common 777715
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 777715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 543650
69.9%
1 234065
30.1%

FLAG_EMAIL
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.9 MiB
0: 706418
1: 71297

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters777715
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

Most occurring characters

ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 777715
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

Most occurring scripts

ValueCountFrequency (%)
Common 777715
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 777715
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 706418
90.8%
1 71297
 
9.2%

OCCUPATION_TYPE
Categorical

HIGH CORRELATION
MISSING

Distinct: 18
Distinct (%): < 0.1%
Missing: 240048
Missing (%): 30.9%
Memory size: 11.9 MiB
Laborers: 131572
Core staff: 77112
Sales staff: 70362
Managers: 67738
Drivers: 47678
Other values (13): 143205

Length

Max length21
Median length20
Mean length10.515029
Min length7

Characters and Unicode

Total characters5653584
Distinct characters36
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSecurity staff
2nd rowSecurity staff
3rd rowSecurity staff
4th rowSecurity staff
5th rowSecurity staff

Common Values

Value                  Count   Frequency (%)
Laborers               131572  16.9%
Core staff             77112   9.9%
Sales staff            70362   9.0%
Managers               67738   8.7%
Drivers                47678   6.1%
High skill tech staff  31768   4.1%
Accountants            27223   3.5%
Medicine staff         26691   3.4%
Cooking staff          13416   1.7%
Security staff         12400   1.6%
Other values (8)       31707   4.1%
(Missing)              240048  30.9%
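
Nearly a third of the OCCUPATION_TYPE values are missing. One common option, sketched here on a small hypothetical sample rather than on the real data, is to keep those rows and treat the missing value as its own category. The label "Unknown" is an assumption for illustration; whether this is appropriate depends on how the column will be used.

```python
from collections import Counter

# Hypothetical sample; None stands for a missing OCCUPATION_TYPE.
occupations = ["Laborers", None, "Core staff", None, "Laborers", "Managers"]

MISSING_LABEL = "Unknown"  # hypothetical label, not present in the data

# Replace each missing entry with the explicit label, then count categories.
filled = [value if value is not None else MISSING_LABEL for value in occupations]
counts = Counter(filled)

for value, count in counts.most_common():
    print(f"{value}: {count}")
```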

Length

Histogram of lengths of the category
ValueCountFrequency (%)
staff 255424
29.4%
laborers 135195
15.6%
core 77112
 
8.9%
sales 70362
 
8.1%
managers 67738
 
7.8%
drivers 47678
 
5.5%
high 31768
 
3.7%
skill 31768
 
3.7%
tech 31768
 
3.7%
accountants 27223
 
3.1%
Other values (13) 92188
 
10.6%

Most occurring characters

ValueCountFrequency (%)
s 652691
11.5%
a 652576
11.5%
r 547836
9.7%
e 544257
 
9.6%
f 510848
 
9.0%
t 368978
 
6.5%
330557
 
5.8%
o 269985
 
4.8%
i 224568
 
4.0%
n 188906
 
3.3%
Other values (26) 1362382
24.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4772552
84.4%
Uppercase Letter 544295
 
9.6%
Space Separator 330557
 
5.8%
Dash Punctuation 3623
 
0.1%
Other Punctuation 2557
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 652691
13.7%
a 652576
13.7%
r 547836
11.5%
e 544257
11.4%
f 510848
10.7%
t 368978
7.7%
o 269985
5.7%
i 224568
 
4.7%
n 188906
 
4.0%
l 153803
 
3.2%
Other values (11) 658104
13.8%
Uppercase Letter
ValueCountFrequency (%)
L 138818
25.5%
C 101927
18.7%
M 94429
17.3%
S 85911
15.8%
D 47678
 
8.8%
H 33454
 
6.1%
A 27223
 
5.0%
P 6714
 
1.2%
R 2946
 
0.5%
W 2557
 
0.5%
Other values (2) 2638
 
0.5%
Space Separator
ValueCountFrequency (%)
330557
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3623
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2557
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5316847
94.0%
Common 336737
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 652691
12.3%
a 652576
12.3%
r 547836
10.3%
e 544257
10.2%
f 510848
9.6%
t 368978
 
6.9%
o 269985
 
5.1%
i 224568
 
4.2%
n 188906
 
3.6%
l 153803
 
2.9%
Other values (23) 1202399
22.6%
Common
ValueCountFrequency (%)
330557
98.2%
- 3623
 
1.1%
/ 2557
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5653584
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 652691
11.5%
a 652576
11.5%
r 547836
9.7%
e 544257
 
9.6%
f 510848
 
9.0%
t 368978
 
6.5%
330557
 
5.8%
o 269985
 
4.8%
i 224568
 
4.0%
n 188906
 
3.3%
Other values (26) 1362382
24.1%

CNT_FAM_MEMBERS
Real number (ℝ)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2088374
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 3
95th percentile: 4
Maximum: 20
Range: 19
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation | 0.90737972
Coefficient of variation (CV) | 0.41079516
Kurtosis | 7.7223182
Mean | 2.2088374
Median Absolute Deviation (MAD) | 0
Skewness | 1.3241752
Sum | 1717846
Variance | 0.82333796
Monotonicity | Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
2 | 423723 | 54.5%
1 | 141477 | 18.2%
3 | 134894 | 17.3%
4 | 66990 | 8.6%
5 | 8999 | 1.2%
6 | 1196 | 0.2%
7 | 273 | < 0.1%
15 | 111 | < 0.1%
9 | 46 | < 0.1%
20 | 6 | < 0.1%
Minimum 10 values
Value | Count | Frequency (%)
1 | 141477 | 18.2%
2 | 423723 | 54.5%
3 | 134894 | 17.3%
4 | 66990 | 8.6%
5 | 8999 | 1.2%
6 | 1196 | 0.2%
7 | 273 | < 0.1%
9 | 46 | < 0.1%
15 | 111 | < 0.1%
20 | 6 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
20 | 6 | < 0.1%
15 | 111 | < 0.1%
9 | 46 | < 0.1%
7 | 273 | < 0.1%
6 | 1196 | 0.2%
5 | 8999 | 1.2%
4 | 66990 | 8.6%
3 | 134894 | 17.3%
2 | 423723 | 54.5%
1 | 141477 | 18.2%
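
The descriptive statistics reported for CNT_FAM_MEMBERS can be cross-checked against its frequency table. The sketch below recomputes the count, sum, mean, variance and coefficient of variation from the value/count pairs listed above. The function name `summarize` is illustrative, and the use of the sample variance (dividing by n - 1) is an assumption about how the report computes it.

```python
from math import sqrt

# Value -> count pairs copied from the CNT_FAM_MEMBERS frequency table.
value_counts = {1: 141477, 2: 423723, 3: 134894, 4: 66990, 5: 8999,
                6: 1196, 7: 273, 9: 46, 15: 111, 20: 6}


def summarize(counts):
    """Return n, sum, mean, sample variance and CV from a value -> count mapping."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Sample variance (divides by n - 1); assumed, not confirmed by the report.
    squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
    variance = squared_dev / (n - 1)
    # Coefficient of variation = standard deviation / mean.
    cv = sqrt(variance) / mean
    return {"n": n, "sum": total, "mean": mean, "variance": variance, "cv": cv}


stats = summarize(value_counts)
print(stats)
```

The count and sum should match the reported number of observations (777715) and Sum (1717846) exactly; the mean, variance and CV should agree with the reported values up to rounding.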

MONTHS_BALANCE
Real number (ℝ)

Distinct | 61
Distinct (%) | < 0.1%
Missing | 0
Missing (%) | 0.0%
Infinite | 0
Infinite (%) | 0.0%
Mean | -19.373564
Minimum | -60
Maximum | 0
Zeros | 24672
Zeros (%) | 3.2%
Negative | 753043
Negative (%) | 96.8%
Memory size | 11.9 MiB

Quantile statistics

Minimum | -60
5-th percentile | -46
Q1 | -29
median | -17
Q3 | -8
95-th percentile | -1
Maximum | 0
Range | 60
Interquartile range (IQR) | 21

Descriptive statistics

Standard deviation | 14.082208
Coefficient of variation (CV) | -0.72687753
Kurtosis | -0.51805712
Mean | -19.373564
Median Absolute Deviation (MAD) | 10
Skewness | -0.59867375
Sum | -15067111
Variance | 198.30858
Monotonicity | Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
-1 | 24963 | 3.2%
-2 | 24871 | 3.2%
0 | 24672 | 3.2%
-3 | 24644 | 3.2%
-4 | 24274 | 3.1%
-5 | 23899 | 3.1%
-6 | 23473 | 3.0%
-7 | 23018 | 3.0%
-8 | 22494 | 2.9%
-9 | 22090 | 2.8%
Other values (51) | 539317 | 69.3%
Minimum 10 values
Value | Count | Frequency (%)
-60 | 321 | < 0.1%
-59 | 627 | 0.1%
-58 | 955 | 0.1%
-57 | 1253 | 0.2%
-56 | 1588 | 0.2%
-55 | 1939 | 0.2%
-54 | 2279 | 0.3%
-53 | 2633 | 0.3%
-52 | 3070 | 0.4%
-51 | 3514 | 0.5%
Maximum 10 values
Value | Count | Frequency (%)
0 | 24672 | 3.2%
-1 | 24963 | 3.2%
-2 | 24871 | 3.2%
-3 | 24644 | 3.2%
-4 | 24274 | 3.1%
-5 | 23899 | 3.1%
-6 | 23473 | 3.0%
-7 | 23018 | 3.0%
-8 | 22494 | 2.9%
-9 | 22090 | 2.8%

STATUS
Categorical

Distinct | 8
Distinct (%) | < 0.1%
Missing | 0
Missing (%) | 0.0%
Memory size | 11.9 MiB
Value | Count
C | 329536
0 | 290654
X | 145950
1 | 8747
5 | 1527
Other values (3) | 1301

Length

Max length | 1
Median length | 1
Mean length | 1
Min length | 1

Characters and Unicode

Total characters | 777715
Distinct characters | 8
Distinct categories | 2
Distinct scripts | 2
Distinct blocks | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
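
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to group the STATUS characters by general category. The helper `category_counts` and the `CATEGORY_NAMES` lookup are illustrative names, not part of the profiling library; the counts are copied from the STATUS frequency table.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(value_counts):
    """Sum character occurrences per Unicode general category.

    value_counts maps each distinct string value to its number of occurrences.
    """
    totals = Counter()
    for value, count in value_counts.items():
        for char in value:
            code = unicodedata.category(char)
            totals[CATEGORY_NAMES.get(code, code)] += count
    return totals


# Counts copied from the STATUS "Common Values" table.
status_counts = {"C": 329536, "0": 290654, "X": 145950, "1": 8747,
                 "5": 1527, "2": 801, "3": 286, "4": 214}
totals = category_counts(status_counts)
print(totals)
```

For STATUS this yields two categories, Uppercase Letter and Decimal Number, whose totals match the "Most occurring categories" table (475486 and 302229).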

Unique

Unique | 0
Unique (%) | 0.0%

Sample

1st row | C
2nd row | C
3rd row | C
4th row | C
5th row | C

Common Values

Value | Count | Frequency (%)
C | 329536 | 42.4%
0 | 290654 | 37.4%
X | 145950 | 18.8%
1 | 8747 | 1.1%
5 | 1527 | 0.2%
2 | 801 | 0.1%
3 | 286 | < 0.1%
4 | 214 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
c | 329536 | 42.4%
0 | 290654 | 37.4%
x | 145950 | 18.8%
1 | 8747 | 1.1%
5 | 1527 | 0.2%
2 | 801 | 0.1%
3 | 286 | < 0.1%
4 | 214 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
C | 329536 | 42.4%
0 | 290654 | 37.4%
X | 145950 | 18.8%
1 | 8747 | 1.1%
5 | 1527 | 0.2%
2 | 801 | 0.1%
3 | 286 | < 0.1%
4 | 214 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 475486 | 61.1%
Decimal Number | 302229 | 38.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 290654 | 96.2%
1 | 8747 | 2.9%
5 | 1527 | 0.5%
2 | 801 | 0.3%
3 | 286 | 0.1%
4 | 214 | 0.1%
Uppercase Letter
Value | Count | Frequency (%)
C | 329536 | 69.3%
X | 145950 | 30.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 475486 | 61.1%
Common | 302229 | 38.9%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 290654 | 96.2%
1 | 8747 | 2.9%
5 | 1527 | 0.5%
2 | 801 | 0.3%
3 | 286 | 0.1%
4 | 214 | 0.1%
Latin
Value | Count | Frequency (%)
C | 329536 | 69.3%
X | 145950 | 30.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 777715 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 329536 | 42.4%
0 | 290654 | 37.4%
X | 145950 | 18.8%
1 | 8747 | 1.1%
5 | 1527 | 0.2%
2 | 801 | 0.1%
3 | 286 | < 0.1%
4 | 214 | < 0.1%

Interactions

[49 interaction plots; images not included]

Correlations


Auto

The auto setting selects an interpretable pairwise metric according to the types of the two columns, using the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
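
The mapping above can be read as a small dispatch rule. The sketch below is only an illustration of that rule, not the library's implementation: the names `auto_metric` and `discretize` are hypothetical, and equal-width binning is an assumed strategy, since the report states only that the number of bins is configurable.

```python
def auto_metric(type_a, type_b):
    """Return (method, range) for a pair of column types.

    Each type must be "numerical" or "categorical".
    """
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError("column types must be 'numerical' or 'categorical'")
    if type_a == "numerical" and type_b == "numerical":
        return "Spearman's rho", (-1.0, 1.0)
    # Categorical-categorical, or numerical-categorical after discretization.
    return "Cramer's V", (0.0, 1.0)


def discretize(values, n_bins):
    """Assign each number to one of n_bins equal-width bins (assumed strategy)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_metric("numerical", "categorical"))
print(discretize([-60, -30, -17, -8, 0], 3))
```

A larger `n_bins` keeps more detail of the numerical column at the cost of sparser cells in the contingency table used for Cramér's V.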

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
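
A minimal sketch of that calculation, using only the standard library: values are replaced by their ranks (ties share the average rank, which is an assumed tie-handling rule), and Pearson's formula is applied to the ranks. The function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

Because y increases whenever x does, the ranks coincide and ρ is 1 (up to floating-point rounding), while Pearson's r on the raw values is below 1.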

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
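
The invariance property can be checked numerically. The sketch below implements the formula directly (the 1/n factors cancel between numerator and denominator, so they are omitted) and compares r before and after shifting and rescaling each variable; the data values are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Illustrative data, roughly linear with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately with positive scale factors.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign of r but keeps its magnitude.
r_flipped = pearson_r(x, [-b for b in y])

print(r_original, r_rescaled, r_flipped)
```

Exact lines of any nonzero slope, such as y = 2x + 1 and y = 0.1x - 5, both give r = 1 under this formula, which is the sense in which the angle to the x-axis does not matter.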

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
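
The pair-counting definition above can be sketched directly. This version corresponds to the variant usually called tau-a, in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie-adjusted variant instead, so values may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on a column with hundreds of thousands of rows.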

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
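
A minimal sketch of the bias-corrected estimator, written from the published formula rather than from the library's source: the chi-squared statistic of the contingency table gives φ² = χ²/n, and both φ² and the table dimensions are then shrunk by terms of order 1/(n - 1). The function name is illustrative, and the code assumes more than one observation.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))

    # Chi-squared statistic over every cell, including cells with zero count.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the number of rows and columns.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(phi2_corr / denominator)


perfect = cramers_v_corrected(["M", "F"] * 50, ["Y", "N"] * 50)
independent = cramers_v_corrected(["M", "M", "F", "F"] * 25, ["Y", "N", "Y", "N"] * 25)
print(perfect, independent)
```

In the first call each label of one variable determines the label of the other, so V is 1 up to rounding; in the second the two variables are balanced across every combination, so V is 0.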

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

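First rows
index | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | MONTHS_BALANCE | STATUS
0 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | 0 | C
1 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -1 | C
2 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -2 | C
3 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -3 | C
4 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -4 | C
5 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -5 | C
6 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -6 | C
7 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -7 | C
8 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -8 | C
9 | 5008804 | M | Y | Y | 0 | 427500.0 | Working | Higher education | Civil marriage | Rented apartment | -12005 | -4542 | 1 | 1 | 0 | 0 | NaN | 2 | -9 | C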
Last rows
index | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | MONTHS_BALANCE | STATUS
777705 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -4 | 0
777706 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -5 | 0
777707 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -6 | 0
777708 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -7 | 0
777709 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -8 | 0
777710 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -9 | 0
777711 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -10 | 2
777712 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -11 | 1
777713 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -12 | 0
777714 | 5150337 | M | N | Y | 0 | 112500.0 | Working | Secondary / secondary special | Single / not married | Rented apartment | -9188 | -1193 | 1 | 0 | 0 | 0 | Laborers | 1 | -13 | 0